Last week, UNICEF published a report titled "COVID-19 and School Closures: One Year of Education Disruption". It points out that Latin America and the Caribbean is home to 3 out of 5 children worldwide who lost an entire school year. In other words, the region accounts for almost 60 per cent of all children who missed an entire school year due to COVID-19 lockdowns, according to the new data released by UNICEF.
So I searched for the original databases and found only three available: 1) Date, ISO, Country and Status; 2) UNICEF Region, Average days closed (weighted by number of students) and Type; 3) Country, Income Group, Days: Academic break, Days: Fully closed, Days: Fully open, Days: Partially closed, Instruction Days, and Number of students.
To find a new angle, I began exploring the databases as follows:
import pandas as pd
import numpy as np
pd.set_option("display.max_columns", 200)
pd.set_option("display.max_colwidth", 200)
#Import database
df = pd.read_csv("covid_impact_education.csv")
df.head(15)

| | Date | ISO | Country | Status | Note |
|---|---|---|---|---|---|
| 0 | 16/02/2020 | ABW | Aruba | Fully open | NaN |
| 1 | 16/02/2020 | AFG | Afghanistan | Fully open | NaN |
| 2 | 16/02/2020 | AGO | Angola | Fully open | NaN |
| 3 | 16/02/2020 | AIA | Anguilla | Fully open | NaN |
| 4 | 16/02/2020 | ALB | Albania | Fully open | NaN |
| 5 | 16/02/2020 | AND | Andorra | Fully open | NaN |
| 6 | 16/02/2020 | ARE | United Arab Emirates | Fully open | NaN |
| 7 | 16/02/2020 | ARG | Argentina | Fully open | NaN |
| 8 | 16/02/2020 | ARM | Armenia | Fully open | NaN |
| 9 | 16/02/2020 | ATG | Antigua and Barbuda | Fully open | NaN |
| 10 | 16/02/2020 | AUS | Australia | Fully open | NaN |
| 11 | 16/02/2020 | AUT | Austria | Fully open | NaN |
| 12 | 16/02/2020 | AZE | Azerbaijan | Fully open | NaN |
| 13 | 16/02/2020 | BDI | Burundi | Fully open | NaN |
| 14 | 16/02/2020 | BEL | Belgium | Fully open | NaN |
#Look at the types
df.dtypes
Date object
ISO object
Country object
Status object
Note float64
dtype: object
#Convert date to datetime.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df.dtypes
Date datetime64[ns]
ISO object
Country object
Status object
Note float64
dtype: object
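The explicit `format` matters because the dates in this file are written day first. A minimal sketch with a made-up ambiguous date (not taken from the dataset) shows how the same string can be read in two different ways:

```python
import pandas as pd

# Hypothetical ambiguous value: 3 April 2020 written day first.
raw = pd.Series(["03/04/2020"])

day_first = pd.to_datetime(raw, format="%d/%m/%Y")    # read as 3 April
month_first = pd.to_datetime(raw, format="%m/%d/%Y")  # read as 4 March

print(day_first.dt.month.iloc[0], month_first.dt.month.iloc[0])
```

Stating the format up front avoids relying on pandas to guess which reading is intended.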
#Import altair
import altair as alt
from vega_datasets import data
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
I wanted to create an overview graph of how governments began closing schools and moving to distance learning. On March 11th, 2020, when the World Health Organization declared the novel coronavirus (COVID-19) outbreak a global pandemic, schools were closed due to COVID-19 in only 26 countries. By March 31st, almost 170 countries had moved to distance learning. The situation began to change in May, when a decline in the number of countries with full school closures was accompanied by an increase in the number of countries where schools were partially or fully open. In July, more than 100 countries started their academic break, and by September at least 80 had decided to resume in-person classes while only 40 remained completely shut down. Today, only 27 are fully closed.
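The counts quoted above can be checked directly from the table before charting. This is a sketch on a few made-up rows that follow the same Date/ISO/Status layout (the ISO codes and statuses are illustrative, not real observations); with the real data, `sample` would be replaced by `df`.

```python
import pandas as pd

# Illustrative rows only; same columns as the school-status table.
sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-03-11", "2020-03-11", "2020-03-11",
         "2020-03-31", "2020-03-31", "2020-03-31"]
    ),
    "ISO": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "Status": ["Fully open", "Fully open", "Fully closed",
               "Fully closed", "Fully closed", "Fully open"],
})

# One row per date, one column per status, values = number of countries.
counts = (
    sample.groupby(["Date", "Status"])["ISO"]
    .nunique()
    .unstack(fill_value=0)
)
print(counts)

# Number of countries fully closed on a given day.
closed_on_31 = counts.loc[pd.Timestamp("2020-03-31"), "Fully closed"]
```

Looking up a single date in `counts` gives the number behind each point of the area chart.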
alt.Chart(df).mark_area().encode(
    alt.X('Date'),
    y = alt.Y('count()'),
    color = 'Status',
    tooltip = ['count()', 'Date', 'Status']
).properties(
    width=850,
    height=300
)
#Make graph by continents.
#Import data by region
df3 = pd.read_csv("region.csv")
df3.head(10)

| | UNICEF Region | Average days | Type |
|---|---|---|---|
| 0 | Latin America and Caribbean | 158 | Fully closed |
| 1 | South Asia | 146 | Fully closed |
| 2 | Eastern and Southern Africa | 101 | Fully closed |
| 3 | Middle East and North Africa | 90 | Fully closed |
| 4 | West and Central Africa | 77 | Fully closed |
| 5 | Eastern Europe and Central Asia | 59 | Fully closed |
| 6 | East Asia and Pacific | 56 | Fully closed |
| 7 | Western Europe | 52 | Fully closed |
| 8 | North America | 0 | Fully closed |
| 9 | Global | 95 | Fully closed |
The region with the highest average number of days of disrupted in-person instruction is Latin America and the Caribbean, followed by South Asia and Eastern and Southern Africa. Schools in Latin America and the Caribbean remained fully closed for an average of 158 days between March 2020 and February 2021, well above the global average of 95 days. Schools in Latin America and the Caribbean stayed fully open for only 6 days last year, and those in South Asia for 7, while the global average is 37 days.
alt.Chart(df3).mark_bar().encode(
    alt.X('Average days'),
    alt.Y('UNICEF Region'),
    alt.Color('Type'),
    tooltip = 'Average days',
    order=alt.Order(
        'Type',
        sort='ascending'
    )
).properties(
    width=700,
    height=300
)
#Import new database
df3 = pd.read_csv("days_students.csv")
df3.head()

| | ISO3 | UNICEF Country | UNICEF Region | Income Group | Days: Academic break | Days: Fully closed | Days: Fully open | Days: Partially closed | Instruction Days | Pre-primary | Primary | Lower Secondary | Upper Secondary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | South Asia | Low income (L) | 32 | 115 | 55 | 33 | 203 | 24,220 | 6,544,906 | 1,982,869 | 1,081,020 |
| 1 | AGO | Angola | Eastern and Southern Africa | Lower middle income (LM) | 0 | 139 | 9 | 87 | 235 | 784,381 | 5,620,915 | 1,525,954 | 508,196 |
| 2 | AIA | Anguilla | Latin America and Caribbean | NaN | 62 | 20 | 93 | 60 | 173 | 434 | 1,646 | 637 | 422 |
| 3 | ALB | Albania | Eastern Europe and Central Asia | Upper middle income (UM) | 77 | 41 | 92 | 25 | 158 | 81,026 | 170,861 | 148,810 | 120,062 |
| 4 | AND | Andorra | Western Europe | High income (H) | 50 | 77 | 105 | 3 | 185 | 2,204 | 4,325 | 2,985 | 1,528 |
df3.shape
(200, 13)
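The student counts in this table are stored as text with thousands separators, so they need converting before any arithmetic. The sketch below does that for the five rows shown above and computes a student-weighted average of days fully closed. That is my reading of how a figure like "Average days closed weighted by number of students" could be built; the exact UNICEF method is an assumption.

```python
import numpy as np
import pandas as pd

levels = ["Pre-primary", "Primary", "Lower Secondary", "Upper Secondary"]

# The five rows from df3.head(), keeping only the columns needed here.
rows = pd.DataFrame({
    "ISO3": ["AFG", "AGO", "AIA", "ALB", "AND"],
    "Days: Fully closed": [115, 139, 20, 41, 77],
    "Pre-primary": ["24,220", "784,381", "434", "81,026", "2,204"],
    "Primary": ["6,544,906", "5,620,915", "1,646", "170,861", "4,325"],
    "Lower Secondary": ["1,982,869", "1,525,954", "637", "148,810", "2,985"],
    "Upper Secondary": ["1,081,020", "508,196", "422", "120,062", "1,528"],
})

# Strip the thousands separators and convert the text to integers.
for col in levels:
    rows[col] = rows[col].str.replace(",", "", regex=False).astype(int)

# Total students per country across the four school levels.
rows["Students"] = rows[levels].sum(axis=1)

# Weighted average (large countries count more) vs. plain average.
weighted = np.average(rows["Days: Fully closed"], weights=rows["Students"])
simple = rows["Days: Fully closed"].mean()
print(round(weighted, 1), round(simple, 1))
```

When loading the full file, passing `thousands=","` to `pd.read_csv` should perform the same conversion at read time.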
#This is a COVID-19 database from Our World in Data (file owid-covid-data.csv) that contains the total number of cases per million.
df4 = pd.read_csv("owid-covid-data.csv")
df4.head()

| | Countries | ISO3 | Total cases per million | Total deaths per million |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1435.355 | 62.962 |
| 1 | Albania | ALB | 39467.649 | 679.686 |
| 2 | Algeria | DZA | 2608.421 | 68.824 |
| 3 | Andorra | AND | 143260.208 | 1449.557 |
| 4 | Angola | AGO | 642.239 | 15.670 |
df4.shape
(194, 4)
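The two tables have different lengths (200 and 194 rows), so some ISO3 codes will not find a match when they are joined. One way to list them is an outer merge with `indicator=True`, which adds a `_merge` column saying which table each row came from. The sketch below uses tiny stand-in frames; the codes `XXX` and `YYY` and their values are made up for illustration.

```python
import pandas as pd

# Stand-ins for the school-closure table and the cases table.
schools = pd.DataFrame({
    "ISO3": ["AFG", "ALB", "XXX"],
    "Days: Fully closed": [115, 41, 10],
})
cases = pd.DataFrame({
    "ISO3": ["AFG", "ALB", "YYY"],
    "Total cases per million": [1435.355, 39467.649, 1.0],
})

# Keep every row from both sides and record where it came from.
both = pd.merge(schools, cases, on="ISO3", how="outer", indicator=True)

# Rows present in only one of the two tables.
unmatched = both[both["_merge"] != "both"]
print(unmatched[["ISO3", "_merge"]])
```

A `_merge` value of `left_only` marks a country missing from the cases table, and `right_only` marks one missing from the school table.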
#I wanted to see the relationship between COVID-19 cases and school closures, so I merged the two databases.
#How can I see the 6 countries that are not in the database?
new_table = pd.merge(df3, df4, on="ISO3")
#I created a scatter plot to analyze the relationship between COVID-19 cases and school closures
interval = alt.selection_interval()
chart1 = alt.Chart(new_table).mark_point().encode(
    x = 'Days: Fully closed',
    y = 'Total cases per million',
    color = alt.condition(interval, 'UNICEF Region', alt.value('lightgray')),
    tooltip = 'Countries',
).properties(
    selection = interval
).properties(
    width=750,
    height=300
)
chart1
#I sorted the data by days fully closed because I thought Altair would graph it in that same order. But it didn't...
df4 = df3.sort_values(by='Days: Fully closed', ascending=False)
df4.head()

| | ISO3 | UNICEF Country | UNICEF Region | Income Group | Days: Academic break | Days: Fully closed | Days: Fully open | Days: Partially closed | Instruction Days | Pre-primary | Primary | Lower Secondary | Upper Secondary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 138 | PAN | Panama | Latin America and Caribbean | High income (H) | 23 | 211 | 1 | 0 | 212 | 95,481 | 418,852 | 200,934 | 121,979 |
| 158 | SLV | El Salvador | Latin America and Caribbean | Lower middle income (LM) | 30 | 205 | 0 | 0 | 205 | 230,010 | 662,740 | 308,565 | 213,011 |
| 16 | BGD | Bangladesh | South Asia | Lower middle income (LM) | 33 | 198 | 4 | 0 | 202 | 3,578,384 | 17,338,100 | 8,497,398 | 7,372,422 |
| 23 | BOL | Bolivia (Plurinational State of) | Latin America and Caribbean | Lower middle income (LM) | 40 | 192 | 1 | 2 | 195 | 353,898 | 1,379,099 | 445,168 | 788,570 |
| 24 | BRA | Brazil | Latin America and Caribbean | Upper middle income (UM) | 34 | 191 | 1 | 9 | 201 | 5,101,935 | 16,106,812 | 13,414,172 | 9,704,007 |
Of course, this is only a first approach. I still don't have a clear angle, because I also want to make scatter plots comparing the number of days schools were closed with: a) Internet access, b) Income, and c) Number of students (in millions) who have missed class. So there is still a lot I want to do.
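As a starting point for the income comparison, the days fully closed can be summarised by income group before plotting anything. The sketch uses only the ten rows printed in this post, so the numbers say nothing about the full dataset; the ordering of the groups from low to high is my own choice.

```python
import pandas as pd

# The ten rows shown in the two df.head() outputs above.
rows = pd.DataFrame({
    "UNICEF Country": [
        "Afghanistan", "Angola", "Anguilla", "Albania", "Andorra",
        "Panama", "El Salvador", "Bangladesh",
        "Bolivia (Plurinational State of)", "Brazil",
    ],
    "Income Group": [
        "Low income (L)", "Lower middle income (LM)", None,
        "Upper middle income (UM)", "High income (H)",
        "High income (H)", "Lower middle income (LM)",
        "Lower middle income (LM)", "Lower middle income (LM)",
        "Upper middle income (UM)",
    ],
    "Days: Fully closed": [115, 139, 20, 41, 77, 211, 205, 198, 192, 191],
})

# Ordered categories so the summary reads from low to high income.
order = ["Low income (L)", "Lower middle income (LM)",
         "Upper middle income (UM)", "High income (H)"]
rows["Income Group"] = pd.Categorical(
    rows["Income Group"], categories=order, ordered=True
)

# Countries without an income group (Anguilla here) are dropped by groupby.
summary = (
    rows.groupby("Income Group", observed=True)["Days: Fully closed"]
    .agg(["count", "median"])
)
print(summary)
```

The same grouping on the full table would show whether the gap between income groups is large enough to be worth a chart.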